High-Level Synthesis for Nested Loop Kernels with Non-Uniform Dependencies
Abstract
In high-level synthesis, parallelizing nested loop kernels has been difficult because of their complex data dependencies, especially non-uniform dependencies. In this paper, we propose a new method to synthesize a parallelized circuit from such kernels using polyhedral optimization, which has been studied extensively in the software field. The key contribution is a buffering method for parallel RAM accesses. Experimental results show that the parallelized circuit with 8 PEs is 5.73 times faster than the sequential one.
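To make the term "non-uniform dependencies" concrete, the following is a minimal sketch and not the method proposed in the paper. It brute-forces the dependence distance vectors of two small, made-up loop nests. The function `dependence_distances` and its `write_cell` and `read_cell` parameters are hypothetical names introduced only for this illustration, and the check is a simple memory-based enumeration rather than a polyhedral analysis.

```python
from itertools import product


def dependence_distances(n, write_cell, read_cell):
    """Collect distance vectors (sink minus source) for an n x n loop nest.

    write_cell and read_cell are hypothetical helpers mapping an iteration
    (i, j) to the array cell it writes or reads. This is a brute-force,
    memory-based check, not a polyhedral analysis.
    """
    iterations = list(product(range(n), repeat=2))
    distances = set()
    for src in iterations:        # iteration that writes a cell
        for dst in iterations:    # later iteration that reads the same cell
            # Tuple comparison is lexicographic, matching sequential loop order.
            if src < dst and write_cell(*src) == read_cell(*dst):
                distances.add((dst[0] - src[0], dst[1] - src[1]))
    return distances


# Uniform case (assumed example): A[i][j] = A[i-1][j-1] + 1
uniform = dependence_distances(
    6, lambda i, j: (i, j), lambda i, j: (i - 1, j - 1))

# Non-uniform case (assumed example): A[i][j] = A[j][i] + 1
non_uniform = dependence_distances(
    6, lambda i, j: (i, j), lambda i, j: (j, i))

print(sorted(uniform))
print(sorted(non_uniform))
```

In the first loop every dependence has the same distance vector, (1, 1), so a single fixed schedule can respect it. In the second loop the distance is (d, -d) with d varying from 1 to 5 depending on the iteration, which is the kind of iteration-dependent pattern that makes parallelization harder.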
Related resources
Automatic Hardware Synthesis of Nested Loops Using UET Grids and VHDL
This paper considers the automatic synthesis of systolic architectures from nested loop algorithmic specifications. The high level input is given in the form of uniform dependence loops with unit dependencies and the target architecture is a multidimensional systolic array with unbounded number of cells. A complete methodology for the hardware synthesis of the resulting architecture, based on V...
Extracting data flow information for parallelizing FORTRAN nested loop kernels
Currently available parallelizing FORTRAN compilers expend a large amount of effort in determining data independent statements in a program such that these statements can be scheduled in parallel without need for synchronisation. This thesis hypothesises that it is just as important to derive exact data flow information about the data dependencies where they exist. We focus on the ...
Automatic Parallelization of Non-uniform Dependences
This report summarizes our current experiences with Automatic Program Parallelization tools for converting sequential Fortran code for use on a multiprocessor computer. A number of such tools were evaluated, including Parafrase, Adaptor, PAT, Petit and the SUIF compiler package. We evaluated the suitability of such tools for parallelizing Computational Fluid Dynamics code supplied by the Army R...
An Optimized Three Region Partitioning Technique to Maximize Parallelism of Nested Loops With Non-uniform Dependences
Many methods for nested loop partitioning exist; however, most of them perform poorly when they partition loops with non-uniform dependences. This paper proposes a generalized and optimized loop partitioning mechanism which can exploit parallelism in nested loops with non-uniform dependences. Our approach, based on the region partitioning technique, divides the loop into variable size p...
EFFICIENT LOOP SCHEDULING AND PIPELINING FOR APPLICATIONS WITH NON-UNIFORM LOOPS
Using parallel processing systems to compute scientific applications is one of the most common solutions for achieving more efficient computing performance. In some applications, such as fluid mechanics, structural analysis, and solid state simulations, the dependencies across iterations (loop-carried dependencies) of the computation of array elements may be constants (uniform) or functions of array...
Publication date: 2013